This invention features vectors and a method for
sequencing DNA. The method includes the steps of:

a) ligating the DNA into a vector comprising a
tag sequence, wherein the tag sequence includes at least 15
bases, will not hybridize to the DNA under stringent
hybridization conditions, and is unique in the vector,
to form a hybrid vector,

b) treating the hybrid vector in a plurality of
vessels to produce fragments comprising the tag
sequence, wherein the fragments differ in length and
terminate at a fixed known base or bases, wherein
the fixed known base or bases differs in each vessel,

c) separating the fragments from each vessel
according to their size,

d) hybridizing the fragments with an
oligonucleotide able to hybridize specifically with the
tag sequence, and

e) detecting the pattern of hybridization of the
tag sequence, wherein the pattern reflects the
nucleotide sequence of the DNA.
Description



MULTIPLEX SEQUENCING 



This invention relates to sequencing of DNA.

The sequence of nucleotide bases in DNA is
generally determined using the methods described
by Maxam and Gilbert (65 Methods Enzymol. 497,
1980) or by Sanger et al. (74 Proc. Natl. Acad. Sci.
USA 5463, 1977). These methods generally involve
the isolation of purified fragments of DNA prior to
sequence determination.

Church et al. (81 Proc. Natl. Acad. Sci. 1991, 1984)
described a method of sequencing directly from
genomic DNA. Unlabeled DNA fragments are separated
in a denaturing gel after complete restriction
endonuclease digestion and partial chemical cleavage
of the genome. After binding these fragments to
a nylon membrane the DNA is hybridized with a
probe comprising RNA homologous to a region near
to the region to be sequenced. The membrane can
be reprobed with other probes to sequence other
regions of interest.



Summary of the Invention



In a first aspect, the invention features a set
comprising at least two vectors. Each vector of the
set has a DNA construct having at least one
restriction endonuclease site. Further, each vector
of the set differs from each other vector of the set
only at a tag sequence. The tag sequence includes at
least 15 base pairs and is located within 50 bases of
a restriction endonuclease site. Further, each tag
sequence in the set will not hybridize under stringent
hybridization conditions to another tag sequence in
the set. By stringent hybridization conditions is
meant low enough salt counterion concentration and
high enough temperature to melt mismatched
duplexes, to form single stranded molecules (e.g., at
42°C in 500-1000 mM sodium phosphate buffers).
By vector is meant any fragment of DNA, whether
linear or circular, which can be ligated to DNA to be
sequenced. Such vectors may include an origin of
DNA replication.

In a related aspect the invention features a vector
for sequencing DNA. The vector comprises a DNA
construct having two restriction endonuclease sites,
each site being recognised by a first restriction
endonuclease, wherein treatment of the vector with
this first endonuclease produces a discrete DNA
fragment consisting of the DNA extending between
the sites. The vector further includes two tag
sequences, both being located on the DNA fragment
and separated from each other by a second
restriction endonuclease site. The tag sequences
include at least 15 base pairs, and both ends of the
tag sequences are located within 50 bases of the
nearest first restriction site and within 50 bases of
the second restriction endonuclease site. The tag
sequences are unique in the vector. By unique is
meant that there are no other nucleotide sequences
in the vector which exactly correspond to the tag
sequences.
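The definition of "unique" given above can be illustrated with a minimal sketch. The function names, the example tag and the example construct are hypothetical and are not taken from the specification; the sketch further assumes a linear vector, exact matching only, and that a match on either strand counts against uniqueness.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def count_exact_matches(strand: str, tag: str) -> int:
    """Count exact (possibly overlapping) occurrences of tag on one strand."""
    strand, tag = strand.upper(), tag.upper()
    width = len(tag)
    return sum(
        1 for i in range(len(strand) - width + 1) if strand[i:i + width] == tag
    )


def tag_is_unique(vector: str, tag: str, min_length: int = 15) -> bool:
    """True if the tag has at least min_length bases and matches exactly once."""
    if len(tag) < min_length:
        return False
    top = count_exact_matches(vector, tag)
    bottom = count_exact_matches(reverse_complement(vector), tag)
    return top + bottom == 1


# Hypothetical construct: NotI site, tag, SmaI site, NotI site.
tag = "ATTACATTCATAACTTA"
vector = "GCGGCCGC" + tag + "CCCGGG" + "GCGGCCGC"
print(tag_is_unique(vector, tag))
```

Exact matching is only a proxy: whether a sequence hybridizes under stringent conditions also depends on near matches, which this sketch does not model.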

In preferred embodiments of these aspects the
tag sequences have two strands, one of which is
free from cytosine residues; the vectors are produced
from a parental vector having randomly
formed tag sequences ligated into it; and the
randomly formed sequences are formed in a DNA
synthesizer supplied with at least two nucleotides at
each addition step.

In another related aspect, the invention features a
set of tag oligonucleotides which bind to the tags in
the vector, wherein each tag oligonucleotide of the
set is unique in the set and neither the tag oligonucleotide
nor an oligonucleotide homologous to the tag
oligonucleotide will hybridize under stringent
hybridization conditions to DNA in the set. Each such
oligonucleotide is at least 15 bases in length.

In preferred embodiments the set of vectors 
described above is produced by ligation into a 
parental vector of this set of tag oligonucleotides. 

In a second aspect, the invention features a 
method for sequencing a DNA specimen, termed 
multiplex sequencing. The method includes the 
steps of: 

a) ligating the DNA specimen into a vector
comprising a tag sequence, to form a hybrid
vector. The tag sequence includes at least 15
bases, will not hybridize to the DNA specimen
under stringent conditions, and is unique in the
vector;

b) treating separate aliquots of the hybrid
vector in a plurality of vessels to produce
fragments each comprising the tag sequence.
These fragments in each vessel differ in length
from each other and all terminate at a fixed
known base or bases (e.g., A, T, C, or G for
Sanger dideoxy sequencing; G, A + G, T + C,
T, or C for Maxam and Gilbert sequencing). The
fixed known base or bases differ from those in
other vessels;

c) separating the fragments from each vessel
according to their size;

d) hybridizing the fragments with an oligonucleotide
able to hybridize specifically with the
tag sequence; and

e) detecting the pattern of hybridization of
the oligonucleotide; this pattern reflects the
nucleotide sequence of the DNA specimen.
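How a pattern of size-separated fragments reflects a nucleotide sequence can be shown with a minimal sketch. It assumes the idealized dideoxy case in which each vessel terminates at exactly one base, and it ignores the combined A + G and T + C reactions of chemical sequencing. The function names and the example sequence are illustrative only.

```python
from typing import Dict, List


def terminated_lengths(sequence: str) -> Dict[str, List[int]]:
    """For each base (one vessel per base), list the lengths of fragments ending there."""
    lanes: Dict[str, List[int]] = {}
    for position, base in enumerate(sequence, start=1):
        lanes.setdefault(base, []).append(position)
    return lanes


def read_ladder(lanes: Dict[str, List[int]]) -> str:
    """Order all fragments by length and read off the terminal base of each."""
    calls = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in calls)


lanes = terminated_lengths("GATTACA")
print(lanes)
print(read_ladder(lanes))
```

Sorting by length stands in for the size separation of step c); reading the lane in which each successive band appears stands in for step e).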

In preferred embodiments, the method further
includes the step of ligating the DNA specimen to a
plurality of vectors to form a plurality of hybrid
vectors; each vector differs from each of the others
at a tag sequence and each tag sequence is unable
to hybridize under stringent conditions to other tag
sequences; the method further includes the step of
rehybridizing the fragments with each oligonucleotide
corresponding to each tag sequence of each
vector; each vector includes two tag sequences; the
method further comprises binding the fragments to
a solid support prior to the hybridizing step; the
method further includes, prior to the separating
step, providing a molecular weight marker in one
vessel and detecting this marker after the separating
step; and the solid support comprises the vector or
the oligonucleotide, wherein the vector or oligonucleotide
acts as an identifying marker for the
support.

In a third aspect, the invention features a method
for repeatedly hybridizing a solid support, comprising
nucleic acid, with a plurality of labels, including
the steps of:

a) enclosing the support within a container
comprising material thin enough to allow detection
of hybridization of nucleic acid with the
labels,

b) inserting hybridization fluid comprising a
first label into the container,

c) removing the hybridization fluid from the
container, and

d) detecting hybridization of the first label to
the solid support, while the solid support
remains in the container.

In preferred embodiments the method further
comprises, after the detecting step, repeating steps
b, c and d using a second label; and most preferably
comprises repeating steps b, c and d at least 20
times with at least 20 labels.

In a fourth aspect, the invention features a method
for determining the DNA sequence of a DNA
specimen, including the steps of:

a) performing a first sequencing reaction on
the DNA specimen;

b) performing a second sequencing reaction
on a known DNA specimen;

c) placing corresponding products of the first
and second sequencing reactions in the same
lanes of a DNA sequencing gel;

d) running the products into the gel;

e) detecting the location of the products in
the gel; and

f) using the location of the known DNA
products to aid calculation of the DNA sequence
of the DNA specimen.

In a fifth aspect, the invention features an
automated hybridization device, for repeatedly hybridizing
a solid support containing nucleic acid, with a
plurality of labels. The device includes a container
enveloping the solid support, formed of material thin
enough to allow detection of hybridization of the
nucleic acid with a label by placement of a
label-sensitive film adjacent the container. Also
included is an inlet and outlet for introducing and
withdrawing hybridization fluid and means for automatically
regulating introduction and removal of
hybridization fluid.

In preferred embodiments the automated hybridization
device includes inflatable means for contacting
the container with the film; a lightproof box
enveloping the container; and a plurality of containers,
each separated by a label-opaque shelf.

Multiplex sequencing significantly increases the
speed at which DNA sequencing can be performed.
Specifically, it reduces the experimental time for
preparation of the DNA for sequencing, for the base
specific chemical reactions (when using the Maxam
and Gilbert methodology) and for the Sanger
dideoxy reactions, for the pouring and running of
sequencing gels, and for reading the image and
sequence data from autoradiographs of these gels.
The above vectors are specifically designed for
multiplex sequencing, having unique DNA sequences
positioned appropriately such that each sequence
provides a specific probe region for DNA inserted
into the vector, and indeed for each strand of the
inserted DNA. Thus, any DNA inserted into these
vectors is readily sequenced, as a pool of such
inserts.

Other features and advantages of the invention 
will be apparent from the following description of the 
preferred embodiments and from the claims. 

Description of the Preferred Embodiments

The drawings will first briefly be described.

Drawings

Figure 1 is a diagrammatic representation of a
multiplex vector.

Figure 2 is the nucleotide sequence of part of
a multiplex vector, including the tag sequences.

Figure 3 is the nucleotide sequences of a set
of tag sequences lacking G residues.

Figure 4 is a schematic diagram representing
the steps in the method of multiplex sequencing.

Figure 5 is a diagrammatic representation
showing data transformations from raw data to
ideal data.

Figure 6 is an isometric view of a container for
automated probing of membranes.

Figure 7 is a diagrammatic representation of
components of an automated probing device.

Multiplex Vectors 

Multiplex vectors are used to form a set of vectors
suitable for multiplex sequencing. The vectors are
provided with a DNA sequence having a) a cloning
site, b) at least one tag sequence and c) optional
removal sites. The cloning site is a position at which
the DNA specimen to be sequenced is placed;
usually it is a restriction endonuclease site. The
vectors are provided with at least one, and preferably
two, tag sequences, or probe hybridization
sites. A tag sequence is a unique DNA sequence of
between about 15-200 base pairs, preferably greater
than 19 base pairs in length, which will act to
specifically identify each vector in hybridization
tests. These sequences preferably have no homology
elsewhere on the vector and no homology to
the genomic DNA to be sequenced. Thus, positive
hybridization using a part or the whole of a tag
sequence as a probe for a mixture of multiplex
vectors specifically identifies the multiplex vector
containing this tag sequence. Further, since it is
preferable that no two tag sequences are the same
on a single vector, and since each strand of any one
tag sequence is unique, a specific strand of the
vector can also be specifically identified. Removal
sites are useful when using the vectors for Maxam
and Gilbert type sequencing. Generally, these sites
are restriction endonuclease sites, preferably acted
upon by the same restriction enzymes, and allow
removal of a fragment of DNA, including the tag
sequences and the DNA specimen to be sequenced.

The tag sequences are positioned close (preferably
within 50 bases) to the cloning site, such that
the DNA specimen to be sequenced can be placed
between the two tag sequences, or at least
downstream from one tag sequence. Further, the
tag sequences are preferably positioned between
two identical restriction endonuclease sites (removal
sites) so that the cloned DNA and tag sequences
can be readily isolated as a single fragment by
digestion with the specific endonuclease.

Those skilled in the art will recognize that vectors
having a single tag sequence may be used in this
invention, or even vectors having more than two tag
sequences. Such multi-tag vectors will preferably
have multiple cloning sites for accepting different
DNA fragments to be sequenced.

Example 1: TcEN vectors

An example of such a set of vectors is shown in
Fig. 1. In this set of vectors, termed TcEN vectors,
each vector has its own pair of unique tag
sequences (labelled PHS) separated by a SmaI site
(the cloning site) and surrounded by NotI sites
(removal sites). The unique tag sequences are
generated by synthesizing oligonucleotides approximately
20 nucleotides in length in a DNA synthesizer,
with all 4 nucleotides provided at each polymerization
step. All 4^20 possible 20-mer nucleotide sequences
are produced by this procedure. The
oligomer mixture was ligated into a parent TcEN
plasmid to form constructs containing the sequence
(NotI restriction site) - (oligomer tag sequence
1) - (SmaI site) - (oligomer tag sequence 2) - (NotI
site) using standard procedures. The parent plasmid
and each TcEN plasmid contain a tetracycline
resistance gene which may be used for selection of
clones containing these vectors. Also provided is an
origin of replication (ori) which functions in
Escherichia coli.
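The size of the tag space described above follows from simple arithmetic, sketched below. The random draw is only a software stand-in for supplying all four nucleotides at each synthesis step; the function names and the seed are assumptions made for illustration.

```python
import random

BASES = "ACGT"


def tag_space_size(length: int, alphabet: str = BASES) -> int:
    """Number of distinct sequences of the given length over the alphabet."""
    return len(alphabet) ** length


def random_tags(n: int, length: int = 20, seed: int = 0) -> list:
    """Draw n random tags, mimicking a synthesizer fed all four nucleotides per step."""
    rng = random.Random(seed)
    return ["".join(rng.choice(BASES) for _ in range(length)) for _ in range(n)]


print(tag_space_size(20))   # 4**20 = 1099511627776
tags = random_tags(92)      # 92 tags, as for forty-six two-tag vectors
print(tags[0])
```

With roughly 1.1 x 10^12 possible 20-mers, two randomly drawn tags are very unlikely to coincide exactly.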

Many recombinant vectors from the original mass
ligation were cloned and sequenced, and forty-six
different vectors (containing 92 unique oligomer tag
sequences) were chosen for initial experiments on
the basis of their having the complete tag
sequence-containing construct and sufficient sequence
diversity to ensure uniqueness. Probe oligomers (tag
probes) complementary to each of the tag sequences
have been prepared synthetically and
radioisotopically labelled. Other labels are equally suitable,
e.g., fluorescent, luminescent and enzymatic
(colorimetric) labelled probes.

Example 2: NoC vectors

A major problem with both dideoxy and chemical
sequencing is the formation of hairpins during
electrophoretic separation of the sequence reaction
products. The hairpins often cause two or more base
positions to comigrate on the gels, causing an
artifact which can be hard to notice or decipher. One
partial solution, sequencing of the opposite nucleotide
strand, can fail if there are tandem or alternative
hairpins, or if the opposite strand has any other
artifacts. Other methods include the use of hot
formamide gels, dITP, or dc7GTP.

Cytosines can be chemically modified so that they
will not base pair with guanines. Although cytosine
chemical modification eliminates hairpins (since at
least two GC base pairs are required to stabilize a
hairpin in sequencing gels containing 7 M urea at
50°C), it also prevents the GC base pair formation
required for most tag probes to be able to bind to
identify specific vectors later in the sequencing
process. This problem is overcome by preparing a
set of NoC vectors having tag sequences which
completely lack cytosines on one strand (and
guanines on the complementary strand) for at least
15 nucleotides in a row (Figs. 2 & 3). These vectors
generally have two such oligonucleotide tag sequences
flanking each cloning site.
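The NoC criterion, a stretch of at least 15 consecutive nucleotides without cytosine on one strand, can be checked with a minimal sketch. The function names and test sequences are hypothetical.

```python
def longest_run_without(seq: str, base: str) -> int:
    """Length of the longest stretch of seq that does not contain the given base."""
    longest = current = 0
    for ch in seq.upper():
        current = 0 if ch == base.upper() else current + 1
        longest = max(longest, current)
    return longest


def is_noc_tag(strand: str, min_run: int = 15) -> bool:
    """True if the strand has at least min_run consecutive cytosine-free nucleotides."""
    return longest_run_without(strand, "C") >= min_run


print(is_noc_tag("ATTAGTTAGGATTAGAT"))  # 17 nucleotides, no C
print(is_noc_tag("ATTAGTTCGGATTAGAT"))  # a C splits it into runs of 7 and 9
```

Because C pairs with G, a cytosine-free run on one strand implies a guanine-free run of the same length on the complementary strand, so only one strand needs to be examined.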

Referring to Fig. 2, the NoC vectors were derived
from the TcEN vectors by substituting the sequence
shown at the EcoRI sites. H26 represents various
combinations of 26 A, C, and T nucleotides present
on the strand at this point. D25 represents 25 A, G,
and Ts. These were chemically synthesized as a
mixture (as described above in Example 1) and
specific examples were cloned and sequenced (see
Fig. 3). The object is to have No Cytosines on the
appropriate strand in the tag sequence which can be
modified by chemical hairpin suppression reactions.

Referring to Fig. 3, the sequences of the tag
sequences of the NoC multiplex vectors of Fig. 2 are
given. The first twenty nucleotides have also been
synthesized as probes. The sequences are all
oriented 5' to 3', left to right. The 5' terminal
nucleotides shown are the 3'-most cytosines of the
NotI sites. The letter P indicates that the tag
sequence is closest to the PstI site; the letter E that
the tag sequence is closest to the EcoRI site. The
numbers are the vector and tag sequence (or tag
probe) numbers; number 00 is the standard plasmid
used to provide an internal control of known
sequence. The NotI site adjacent to the PstI site has
one C deleted so that NotI cleavage of this vector
and probing with tag sequence E00 gives a 25"
base long set of sequence markers (3' end-labeled
as with the unknown sequences).
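The symbols H26 and D25 follow the standard one-letter codes for mixed bases, in which H stands for A, C or T and D stands for A, G or T. A minimal sketch of testing a concrete sequence against such a pattern is given below; the function name and example sequences are hypothetical, and only the codes needed here are listed.

```python
# Subset of the standard one-letter degenerate base codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "H": "ACT",   # not G
    "D": "AGT",   # not C
    "N": "ACGT",
}


def matches_pattern(pattern: str, seq: str) -> bool:
    """True if seq has the pattern's length and each base is allowed at its position."""
    if len(pattern) != len(seq):
        return False
    return all(base in IUPAC[code] for code, base in zip(pattern, seq.upper()))


h26 = "H" * 26
d25 = "D" * 25
print(matches_pattern(h26, "ACT" * 8 + "AC"))  # 26 bases, no G
print(matches_pattern(d25, "AGT" * 8 + "A"))   # 25 bases, no C
```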

These vectors also incorporate bacterial transcription
terminator sequences flanking the tag
sequences, to suppress excessive transcription
across the boundary between foreign DNA and the
vector DNA and thus promote efficient replication.

Multiplex sequencing method

Multiplex sequencing is a method for keeping a
large set of DNA fragments as a precise mixture
throughout most of the steps of DNA sequencing.
Each mixture can be handled with the same effort as
a single sample in previous methods, and so a
greater total number of fragments can be handled
within a fixed time period. The mixture must be
deciphered at the end of the process. This is done by
tagging the fragments at the beginning of the
process with unique DNA sequences (tags) and
then, at the end of the process, hybridizing complementary
nucleic acid (probes) to the sequencing
reactions which have been spread out by size, and
immobilized on large membranes. Many different
sequences are obtained from each pool by hybridizing
with a succession of end-specific, strand-specific
probes. Thus, in this procedure DNA purifications,
base specific reactions and gel loadings
normally done on individual fragments are replaced
by operations on pools of fragments (e.g., about 46
fragments per pool). In general, the steps in the
process are cloning of genomic DNA into a set of
multiplex vectors, mixing one clone derived from
each vector to form a pool of vectors, performing
sequencing reactions on each pool of such vectors,
running these reactions on a sequencing gel,
binding the DNA fragments in the gel to a solid
support, and probing this support with tag probes
specific for each vector to identify the DNA sequence
of any one cloned fragment.
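The deciphering of a pooled mixture by successive probing can be illustrated with a toy model. Each fragment is reduced to a record of its tag, its length and its terminal base; hybridization is reduced to an exact comparison of tag names. All names and sequences below are hypothetical.

```python
from typing import Dict, List, Tuple

Fragment = Tuple[str, int, str]  # (tag name, fragment length, terminal base)


def sequencing_fragments(tag: str, insert: str) -> List[Fragment]:
    """All terminated fragments for one tagged clone, one per insert position."""
    return [(tag, length, insert[length - 1]) for length in range(1, len(insert) + 1)]


def load_membrane(pool: Dict[str, str]) -> List[Fragment]:
    """Mix the fragments of every clone in the pool, as on a single membrane."""
    membrane: List[Fragment] = []
    for tag, insert in pool.items():
        membrane.extend(sequencing_fragments(tag, insert))
    return membrane


def probe_membrane(membrane: List[Fragment], tag: str) -> str:
    """Read the ladder made visible by one tag probe, ordered by fragment length."""
    hits = sorted((length, base) for t, length, base in membrane if t == tag)
    return "".join(base for _, base in hits)


pool = {"tag-01": "GATTACA", "tag-02": "CCGTA"}
membrane = load_membrane(pool)
print(probe_membrane(membrane, "tag-01"))
print(probe_membrane(membrane, "tag-02"))
```

The same membrane list is reused for every probe, which mirrors the point that one set of reactions and one gel serve all clones in the pool.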

This methodology can be used to sequence DNA
in combination with any standard sequencing procedure,
for example, nested deletions (Henikoff, 28
Gene 351, 1984), cDNA (Okayama et al., 2 Mol. Cell.
Biol. 161, 1982), shotgun (Sanger et al., 162 J. Mol.
Biol. 729, 1982) or DNA hybridization techniques;
and can be combined with amplification methodology,
for example, the polymerase chain reaction
system of Saiki et al. (230 Science 1350, 1985).

Example 3: Sequencing Escherichia coli genomic 
DNA 

In this example, the steps of the multiplex 
sequencing process are described below, with 
reference to Fig. 4. 

Step 1 

Genomic DNA (or equivalent large DNA fragments)
of Escherichia coli was sonically fragmented
and electrophoretically separated out for further
processing. The fraction containing fragments of
800-1200 bp in length was isolated, since this size is
optimal for sequencing. This treatment assures that
virtually all possible DNA sequences from the
original DNA are represented in a fraction of
convenient molecular size for further processing.

Step 2. 

The sonically fragmented and sized fraction was
treated with the enzymes Bal31 and T4 polymerase,
to produce blunt-ended fragments. Aliquots of the
blunt-ended fragment mixture were then ligated
separately into each of the set of 46 TcEN multiplex
vectors and cloned to produce 46 different libraries.
Each library contains clones of a large representative
sample from the original sonic fraction. All
fragments in any library contain the tags derived
from the cloning vector used to produce the library,
one at each end of each strand.

Each library was then spread or "plated" onto an
agar plate under limiting dilution conditions (conditions
such that individual bacteria will be isolated
from each other on the plate, and will form separate
clonal colonies). The plates contained tetracycline
so that only cells which have acquired a TcEN
plasmid, with its tetracycline-resistance gene, will
survive. There were 46 such plates, and about 100
colonies were established per plate. After a period to
allow the individual bacteria to grow into colonies,
one clone (i.e., one colony, which will contain
offspring of a single bacterium) from each of the 46
plates was then mixed together to form a pooled
sample. This pooled sample contained a single
clone, containing a single approx. 1 kb fragment,
from each of the 46 libraries, that is, 46 different 1
kb fragments of the original DNA sample. Each
fragment is labeled at each end with the tag
oligomers unique to the library from which it was
produced. 96 such pooled samples were made.
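The pooling scheme can be sketched as a simple bookkeeping exercise. The library and colony labels are invented for illustration, and taking the i-th colony from every plate for the i-th pool is an assumption; the text only requires one colony per plate per pool.

```python
from typing import Dict, List, Tuple


def make_pools(libraries: Dict[str, List[str]], n_pools: int) -> List[List[Tuple[str, str]]]:
    """Build n_pools pooled samples, each taking one colony from every library."""
    pools = []
    for i in range(n_pools):
        pool = [(vector, colonies[i]) for vector, colonies in libraries.items()]
        pools.append(pool)
    return pools


# Hypothetical labels: 46 vector libraries with about 100 colonies per plate.
libraries = {
    f"TcEN-{v:02d}": [f"colony-{v:02d}-{c:03d}" for c in range(100)]
    for v in range(1, 47)
}
pools = make_pools(libraries, n_pools=96)
total_clones = sum(len(pool) for pool in pools)  # 96 pools x 46 clones each
print(len(pools), len(pools[0]), total_clones)
```

Under these numbers the 96 pooled samples together carry 96 x 46 = 4416 clones, each identifiable by the tags of its library.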

Step 4.

Each of the pooled samples was treated via the
Maxam-Gilbert nucleotide sequencing procedure to
produce a large number of DNA fragments of
differing length. Each fragment will contain one of
the tag sequences at one end. Some of these
chemical sequencing reactions were performed in
standard 96-well microtiter plates. The usual alcohol
precipitation steps were eliminated. This was done
by simply adding 10 µl of dilute dimethyl sulfate (1
mM), acetic acid (15 mM) or 0.1 mM potassium
permanganate (for G + C, A + G, and T reactions,
respectively) to 5 µl of DNA followed by 50 µl of 1.3 M
piperidine. The reactions, lyophilization, resuspension,
and loading were all done in the 96-well plates
using multi-tipped pipets specially designed for the
8x12 well format.

The pooled, reacted samples were then electrophoretically
separated on sequencing gels by conventional
means. Typically, the four sequencing
reaction mixtures of twenty-four different pool
samples were applied in separate lanes on each gel.
The separated nucleotide patterns were then transferred
to a nylon membrane, and fixed to the
membrane (other solid media are equally suitable),
as described by Church & Gilbert, supra.
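The lane and probe counts implied by these figures can be worked out directly. The constants below are taken from the text; the derived totals are simple products and assume that every clone in every pool yields a readable ladder for both of its tags, which is an idealization.

```python
POOLS_PER_GEL = 24        # pooled samples loaded per gel
REACTIONS_PER_POOL = 4    # sequencing reaction mixtures per pooled sample
VECTORS_PER_POOL = 46     # one clone per TcEN vector in each pool
TAGS_PER_VECTOR = 2       # one tag on each side of the cloning site

lanes_per_gel = POOLS_PER_GEL * REACTIONS_PER_POOL
probes_per_membrane = VECTORS_PER_POOL * TAGS_PER_VECTOR
ladders_per_membrane = POOLS_PER_GEL * probes_per_membrane

print(lanes_per_gel)         # 96 lanes
print(probes_per_membrane)   # 92 distinct tag probes
print(ladders_per_membrane)  # 2208 sequence ladders from one membrane
```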

Step 5.

Each nylon membrane, with each lane containing
overlapping sequencing fragments of 46 different 1
kb fragments, was then hybridized with one of the 92
probe oligonucleotides, each one of which is
complementary to one of the tag sequences.
Unreacted tag probe was then washed off with a
7-fold dilution of the hybridization buffer at 23°C, and
the autoradiographic pattern of the treated gel
recorded. Only those sequencing fragments containing
the complementary tag sequence will hybridize
with any given probe oligonucleotide, and
thus be detectable by autoradiography. Hybridized
probe was then removed, and the procedure






EP 0 303 459 A2 






repeated with another tag probe oligonucleotide.
The nucleotide pattern fixed on the nylon membrane
can be repeatedly hybridized in 7% SDS, 770 mM
sodium phosphate, pH 7.2 at 42°C, with a tag probe
oligonucleotide, excess probe washed off, autoradiographed,
then washed free of probe (by melting
the 20 base pair DNA duplexes at 65°C in low salt
(2 mM Tris-EDTA-SDS)), and reacted with another
tag probe. Over 45 successive probings have been
achieved. There is no reason to expect that this is
the limit on the number of successive probings
possible on one membrane. Each tag probe oligonucleotide
will detect only those sequencing fragments
derived from whichever of the 46 original 1 kb
fragments was cloned with the complementary tag
probe oligonucleotide.
The hybridizations, washes and exposures were
performed in an automated probing device (see
below) with the membrane sealed in a container
made of thin plastic (Scotchpak® #229, about 60
µm thick). The solutions were introduced through a
tube to the inside and the X-ray film pressed tightly
against the plastic layer by applying pressure (e.g.,
3 mm Hg air pressure above atmospheric pressure)
to the film via a 4 kg flat weight on a constrained
inflatable element, such as a plastic bag.

The membranes were marked with vector DNA 
used in a manner like ink on paper to automatically 
identify the membrane and the probe used by the 
hybridization properties of the DNA markings. 
Markings to identify the probe used consisted of one 
vector per dot or line corresponding to each probe, 
or any appropriate mixture of vectors or fragments 
thereof. Markings to identify the membrane were a
mixture of all vectors, which will hybridize with all the
probes used in sequencing procedures.

Internal standards were also provided in at least
one lane in the gel. These known internal standards
gave sequencing patterns which are useful for
interpretation of unknown sequences. Specifically,
during digital image processing of the film sequence
data the prior knowledge of lane and band positions
and shapes for the sequencing film can speed
processing for all subsequent films prepared from
the same membrane. Prior knowledge of reaction
chemistry deviations and local gel artifacts seen in a
standard sequencing reaction also helps in accurate
estimation of errors for all subsequent films. For
example, internal standards can be applied to each
lane and discerned prior to probing with oligonucleotides.
One such standard is the product of a
sequencing reaction on a vector having no insert,
thus providing a known DNA sequence. Such known
DNA sequences provide ideal internal standards
since the band shapes, lane shapes, and reaction
chemistry visible from one probing are congruent
with those in all subsequent probings. This aids the
data reduction steps and the quantitative recognition
of trouble spots. This technique can also be used for
mapping of restriction enzyme sites in DNA molecules.

Interpretation of the X-ray films generated by the
above process includes six steps to transform the
raw data to an idealized form. The resulting data is
interpreted to provide a DNA sequence and estimates
of the likelihood of alternative interpretations
made. The steps are: a) to find the boundaries of the
lanes and straighten them if necessary; b) to adjust
the interband distances; c) to straighten the bands
across each lane; d) to adjust the interlane displacements;
e) to adjust the band thicknesses; and f) to
adjust the reaction specificities (i.e., to ensure that
each band in each of the lanes in a set of reactions is
a true band). Some of these steps are discussed by
Elder et al., 14 Nuc. Acid. Res. 417, 1986.

Referring to Figure 5, a schematic example of
band interpretation is provided. DNA sequencing
reactions from an internal standard (left panel) and
an unknown DNA sample (second panel from left)
were placed in the same lanes of a gel and probed
with appropriate probes. Figure 5 is a schematic of
the resulting X-ray films. As can be seen from the left
two panels both the standard and the unknown
deviate from an ideal result (shown in the right panel)
in an identical manner. The DNA sequence of the
standard is provided on the left of Figure 5. Using
this sequence it is possible to determine which
bands on the film represent true bands, rather than
artifacts. From this analysis the DNA sequence of
the unknown can then also be determined by making
allowance for artifacts detected in the standard
sample. Thus, referring to Figure 5, the transformation
steps described above can be applied. In the
first step lane curvature is straightened in all four
lanes; in the second step interband distances are
made equal; in the third step band curvature is
straightened; in the fourth step severe interlane
displacement in the C lane relative to the other three
lanes is adjusted; in the fifth step thick bands in the
R lane are made thinner; and in the sixth step false
bands are removed. After these steps the ideal
result is determined and thence the DNA sequence
of the unknown (shown on the right of Figure 5).

Each adjustment that involves manual or automatic
pattern recognition on the entire film data set
can take about an hour. By using the above
multiplex sequencing method with an internal
standard of known sequence each of the adjustments
can be more accurately determined than for
an unknown sequence, and then applied to each set
of unknown film data obtained from subsequent
probings of the same membrane. Thus, one standard
pattern can be used for a large number of later
probings of the same membrane. The transformation
requirements can be memorized by the computer
and recalled as necessary.
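The idea of determining an adjustment once on the known standard and then reusing it for later probings can be sketched for one of the six steps, the interlane displacement. The sketch assumes band positions are already available as numbers along the direction of migration and models the displacement as a constant offset per lane; both are simplifications, and the function names and values are hypothetical.

```python
from typing import Dict, List


def lane_offsets(observed: Dict[str, List[float]],
                 expected: Dict[str, List[float]]) -> Dict[str, float]:
    """Mean displacement of each lane of the standard from its expected band positions."""
    offsets = {}
    for lane, positions in observed.items():
        differences = [o - e for o, e in zip(positions, expected[lane])]
        offsets[lane] = sum(differences) / len(differences)
    return offsets


def apply_offsets(bands: Dict[str, List[float]],
                  offsets: Dict[str, float]) -> Dict[str, List[float]]:
    """Remove the stored per-lane displacement from another probing of the same membrane."""
    return {lane: [p - offsets[lane] for p in positions]
            for lane, positions in bands.items()}


# Standard of known sequence: the C lane runs 2.0 units further than expected.
expected_standard = {"C": [10.0, 30.0], "G": [20.0]}
observed_standard = {"C": [12.0, 32.0], "G": [20.0]}
stored = lane_offsets(observed_standard, expected_standard)

# Later probing of the same membrane with a different tag probe.
unknown = {"C": [17.0], "G": [25.0]}
print(apply_offsets(unknown, stored))
```

The stored offsets play the role of the transformation that is determined once from the standard and recalled for each subsequent film.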

Automated Probing Device 

*5 An automated probing device is useful for auto- 
mation of the visualization of latent multiplexed DNA 
sequences immobilized on nylon membranes, or for 
any other process where multiple cycles of crocing 
are necessary. 

In general, after the DNA to be probed has been
transferred and crosslinked to a membrane support, e.g.,
a nylon membrane, the nylon membrane is heat-sealed into
a polyester/polyethylene laminate bag such as Scotchpak©.
The automated probing device performs the following
steps:

a) 100 ml of prehybridization buffer (7% SDS, 10% PEG
(MW ^C0), 0.12 M sodium phosphate pH 7.2, and 0.25 M
NaCl) is introduced into the bags and the membranes
equilibrated;

b) this buffer is removed and 90 ml of hybridization
buffer with probe (5 pM radiolabeled oligonucleotide in
prehybridization buffer) is introduced, followed by a
10 ml chase of prehybridization buffer;

c) the membranes are then incubated 4 to 15 hours at
42° C;

d) wash buffer (1% SDS, 70 mM sodium phosphate pH 7.2) is
introduced, incubated at room temperature for 45 minutes
and removed; the wash is repeated a minimum of 5 times;

e) after the last wash, X-ray films are placed over the
membranes and exposed. To ensure good contact between the
film and membrane an air bag is inflated directly over
the film;

f) after exposure the air bag is deflated by use of a
vacuum and the films processed;

g) the probe is removed with hot stripping buffer (2 mM
EDTA, TRIS base pH 7.5, 0.2% SDS at 80° C) and the cycle
repeated with a new probe.
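The quantities in steps a) to g) imply some simple arithmetic, worked here as a rough Python sketch. It uses only the figures stated above (5 pM probe in 90 ml, a 10 ml chase, 4 to 15 hours of incubation, at least five 45-minute washes); the constant and function names are hypothetical, and the time estimate deliberately ignores fill and drain times, film exposure and stripping, which the text does not quantify.

```python
PROBE_CONC_PM = 5.0         # probe concentration in pM, step b)
PROBE_VOLUME_ML = 90.0      # hybridization buffer with probe, step b)
CHASE_VOLUME_ML = 10.0      # prehybridization buffer chase, step b)
INCUBATION_HOURS = (4, 15)  # shortest and longest incubation, step c)
WASH_MINUTES = 45           # one wash incubation, step d)
MIN_WASHES = 5              # minimum number of washes, step d)

def probe_amount_pmol(conc_pm, volume_ml):
    """Amount of probe in pmol; 1 pM is 1 pmol per litre."""
    return conc_pm * volume_ml / 1000.0

def incubation_plus_wash_hours(incubation_hours, washes=MIN_WASHES):
    """Hours spent incubating and washing only (no fill, drain, exposure or strip time)."""
    return incubation_hours + washes * WASH_MINUTES / 60.0

probe_pmol = probe_amount_pmol(PROBE_CONC_PM, PROBE_VOLUME_ML)
bag_volume_ml = PROBE_VOLUME_ML + CHASE_VOLUME_ML
shortest, longest = (incubation_plus_wash_hours(h) for h in INCUBATION_HOURS)
```

Under these figures each bag receives about 0.45 pmol of probe in a final volume of 100 ml, and the incubation and washes alone occupy between 7.75 and 18.75 hours per cycle.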

Referring to Figs. 6 and 7, automated prober 10 is formed
of a light-tight aluminum housing 11, having a series of
grooves 20 for slideably accepting a series of shelves
18. Shelves 18 are spaced apart about 0.4" to allow a
re-usable pouch 12 formed from plastic of thickness 0.55
micron, an X-ray film (not shown) and an air bag 16 to be
sandwiched between each shelf 18. Housing 11 has a height
A of 10.75", a width B of 20" and a depth C of 24". Each
pouch 12 is provided with an inlet pipe 22 and an outlet
pipe 24, and the housing is heated by heater 26. A door
(not shown) of thickness 0.25" is provided with a 0.125"
neoprene light-tight gasket seal and a latch. Automated
prober 10 is assembled and manufactured by standard
procedure. Light-tight construction of automated prober
10 allows exposures to be performed outside of a
photographic darkroom; and aluminum housing 11 provides
efficient attenuation of beta particles from 32P decay,
with minimal secondary X-ray formation.

Re-usable pouches 12 allow tight contact with the X-ray
film to be exposed, without need for pouch disassembly.
Membranes 14 are fixed (by heat tacking of corners)
inside each pouch 12 to prevent movement and creasing of
membranes 14. In addition to membranes for multiplex
sequencing the automated probing device is capable of
accommodating membranes from standard restriction digest
transfers, and dot and slot blots used in diagnostics.
The horizontal format allows uniform spatial distribution
of a probe, without need for mesh supports. The rigid
bookshelf type structure allows air bags 16 to expand
between shelves 18 to ensure tight contact between each
X-ray film and an adjacent membrane 14. Each shelf 18
easily slides out of grooves 20 for membrane replacement.

A computer (not shown) controls temperature, liquid, air,
and vacuum flow. Only the X-ray films and vessels
containing liquid input and output have to be manually
changed.

Also provided is a heated, ventilated output waste vessel
30 which allows 100-fold concentration of liquid wastes,
thereby reducing disposal costs.

Automated prober 10 is constructed as repeating shelf
units 42 of ten membranes, each repeat of ten membranes
having dedicated reservoirs and valves for service
buffers including hybridization buffer, probe, wash
solution and stripping buffer. Each individual membrane
has its own input valve. The waste output valves are
placed in series with a 12VDC-controlled Gorman-Rupp
reciprocating pump; the buffer input valves are also
placed in series via a 12VDC-controlled Gorman-Rupp
reciprocating pump. Air input is used to inflate air bags
for X-ray film exposures. These air bags may be covered
by conductive foam to enhance membrane contact. A house
vacuum is used to evacuate the air bags after exposure.
During probing the temperature is maintained at 42° C by
heating mats 26 placed inside the top and bottom of the
apparatus.

Computer control is by a TRS-80 Model 102, which controls
the valves via a CIP/35 serial controller (SIAS
Engineering). A BASIC program controls all buffer inputs
and outputs by timing loops; thus it is important to
calibrate the flow rates before setting parameters.
Parameters are set by creating a text file that can be
opened and read by the BASIC program. The input file has
lines that have a number as the first element, followed
by a tab before any descriptors are used. The line order
is:

1. - membrane number (1-10)
2. - repeat units (1-13)
3. - minimum membrane number
4. - maximum membrane number
5. - prehyb input time (min)
6. - prehyb incubate time (min)
7. - prehyb output time (min)
8. - probe input time (min)
9. - probe chase time (min)
10. - probe incubation time (hours)
11. - probe output time (min)
12. - wash input time (min)
13. - wash incubation time (min)
14. - wash output time (min)
15. - number of washes (integer)
16. - strip input time (min)
17. - strip incubate time (min)
18. - strip output time (min)
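The parameter file described above could be read along the following lines. This is a Python sketch rather than the BASIC program itself, which is not reproduced in the text; the field names, the `parse_parameters` helper and the sample values are all hypothetical. It assumes only what is stated: eighteen lines in the listed order, each starting with a number, with any descriptor following a tab.

```python
# Hypothetical field names, one per line of the input file, in the order listed above.
FIELDS = [
    "membrane_number", "repeat_units", "min_membrane", "max_membrane",
    "prehyb_input_min", "prehyb_incubate_min", "prehyb_output_min",
    "probe_input_min", "probe_chase_min", "probe_incubation_hours",
    "probe_output_min",
    "wash_input_min", "wash_incubation_min", "wash_output_min",
    "number_of_washes",
    "strip_input_min", "strip_incubate_min", "strip_output_min",
]

def parse_parameters(text):
    """Read the leading number of each non-blank line; text after the first tab is ignored."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} lines, got {len(lines)}")
    return {
        name: float(line.split("\t", 1)[0])
        for name, line in zip(FIELDS, lines)
    }

# Illustrative values only, written as "<number><tab><descriptor>" lines.
values = [1, 1, 1, 10, 2, 30, 2, 2, 1, 4, 2, 2, 45, 2, 5, 2, 15, 2]
sample_text = "\n".join(f"{value}\t{name}" for value, name in zip(values, FIELDS))
params = parse_parameters(sample_text)
```

Because the program described in the text times every transfer with loops, each value is only meaningful once the flow rates have been calibrated; the sketch does not model that calibration.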

Other embodiments are within the following claims.



Claims



1. A set comprising at least two vectors, each said
vector of said set characterised by having a DNA
construct having at least one restriction endonuclease
site and having at least one tag sequence comprising at
least 15 base pairs, each tag sequence being located
within 50 bases of one said site, each said tag sequence
being incapable of hybridizing with another of said tag
sequences under stringent hybridization conditions,
each said vector of said set differing from each other
said vector of said set only at said tag sequence.

2. A vector for sequencing genomic DNA of an organism,
said vector characterised by a DNA construct having at
least two restriction endonuclease sites, each said site
being recognised by a first restriction endonuclease,
wherein treatment of said vector with said first
endonuclease produces a discrete DNA fragment consisting
of the DNA extending between said sites comprising two
tag sequences separated from each other by a second
restriction endonuclease site, wherein said tag sequences
each comprise at least 15 bases and each end of said tag
sequences is located within 50 bases of the nearest said
first and second restriction site, said tag sequences
being unique in said vector.

3. A set as claimed in claim 1, wherein said tag
sequences have no cytosine residues in one strand.

4. A vector as claimed in claim 2, wherein said tag
sequences have no cytosine residues in one strand.

5. A set as claimed in claim 1 wherein said vectors are
produced from a parental vector having randomly formed
tag sequences ligated into said parental vector.

6. A set as claimed in claim 5 wherein said randomly
formed sequences are formed in a DNA synthesizer supplied
with at least two nucleotides at each addition step.

7. A set of tag oligonucleotides, characterised in that
each said tag oligonucleotide of said set is unique in
said set and neither said tag oligonucleotide nor
oligonucleotide homologous to said tag oligonucleotide
will hybridize under stringent hybridization conditions
to other said tag oligonucleotides in said set, wherein
each tag oligonucleotide is at least 15 bases in length.

8. A set of vectors produced by ligation into a parental
vector of the set of tag oligonucleotides as claimed in
claim 7.

9. A set as claimed in any one of claims 1, 3, 5, 6 or 7
wherein said hybridization conditions comprise conditions
in which mismatched duplexes are melted to form single
stranded molecules.

10. A set as claimed in claim 9 wherein said
hybridization conditions comprise heating at 42° C in
500-1000 mM sodium phosphate buffer.

11. A method for sequencing a DNA specimen, said method
characterised by the steps of:

a) ligating said DNA specimen into a vector comprising a
tag sequence, said tag sequence comprising at least 15
bases, wherein said tag sequence will not hybridize to
said DNA specimen under stringent hybridization
conditions and is unique in said vector, to form a hybrid
vector,

b) treating separate aliquots of said hybrid vector in a
plurality of vessels to produce fragments comprising said
tag sequence, wherein said fragments in each vessel
differ in length from each other and all terminate at a
fixed known base or bases, wherein said fixed known base
or bases differ from those in other said vessels,

c) separating said fragments from each said vessel
according to their size,

d) hybridizing said fragments with an oligonucleotide
able to hybridize specifically with said tag sequence,
and

e) detecting the pattern of hybridization of said tag
sequence, wherein said pattern reflects the nucleotide
sequence of said DNA specimen.

12. A method as claimed in claim 11 further comprising
ligating said DNA to a plurality of said vectors to form
a plurality of said hybrid vectors.

13. A method as claimed in claim 12 wherein each said
vector differs at said tag sequence and each said tag
sequence is unable to hybridize under stringent
hybridization conditions to other said tag sequences.

14. A method as claimed in claim 11 further comprising
rehybridizing said fragments with each said tag sequence
of each said vector.

15. A method as claimed in claim 11, wherein each said
vector comprises two said tag sequences.

16. A method as claimed in claim 11 further 
comprising binding said fragments to a solid 
support prior to said hybridizing. 

17. A method as claimed in claim 11 or claim 13 wherein
said stringent hybridization conditions comprise
conditions in which mismatched duplexes are melted to
form single stranded molecules.

18. A method for repeatedly hybridizing a solid support,
comprising nucleic acid, with a plurality of labels,
characterised by the steps of:

a) enclosing said support within a container comprising
material thin enough to allow detection of hybridization
of said nucleic acid with said labels,

b) inserting hybridization fluid comprising a first said
label into said container,

c) removing said hybridization fluid from said container,
and

d) detecting hybridization of said first label to said
solid support, while said solid support remains in said
container.

19. A method as claimed in claim 18, further comprising
after said detecting step, repeating steps b, c and d
using a second said label.

20. A method as claimed in claim 18, wherein steps b, c
and d are repeated at least 20 times with at least 20
said labels.

21. A method as claimed in claim 11, further comprising
prior to said separating step, the steps of providing a
molecular weight marker in a said vessel and detecting
said markers after said separating step.

22. A method as claimed in claim 16 wherein said solid
support comprises said vector or said oligonucleotide,
wherein said vector or oligonucleotide act as an
identifying marker for said support.

23. A method for determining the DNA sequence of a DNA
specimen characterised by the steps of:

a) performing a first sequencing reaction on said DNA
specimen;

b) performing a second sequencing reaction on a known DNA
specimen;

c) placing corresponding products of said first and
second sequencing reactions in the same lanes of a DNA
sequencing gel;

d) running the products into said gel;

e) detecting the location of said products in said gel,
and

f) using the location of said known DNA products to aid
calculation of the DNA sequence of said DNA specimen.

24. An automated hybridization device, for repeatedly
hybridizing a solid support comprising nucleic acid, with
a plurality of labels, characterised by:

a container enveloping said solid support, comprising
material thin enough to allow detection of hybridization
of said nucleic acid with a said label by placement of a
label-sensitive film adjacent said container,

said container including an inlet and outlet for
introducing and withdrawing hybridization fluid, and
comprising means for automatically regulating
introduction and removal of said hybridization fluid.

25. An automated hybridization device as 
claimed in claim 24 comprising inflatable means 
for contacting said container with said film. 

26. An automated hybridization device as 
claimed in claim 24 or 25 comprising a lightproof 
box enveloping said container. 

27. An automated hybridization device as claimed in claim
24 or 25 comprising a plurality of said containers, each
separated by a label-opaque shelf.

FIG. 3

01 P CCTTTCATTACAACTAATCA TAACAT.
02 P CATAAATTATTCTACTTATA AAAAATA
03 P CAACTCCTATCAACTCTACA CTTACTC
04 P CAATATTTAAACCTCACACT TCCAATA
05 P CATAATATAACCTAAACCCT AAATCTT
06 P CCACATCCAAAATAATCAAT CAACATA
07 P CTACTAAATTTCCTTTATAA TCCCC«A.
08 P CCCCTCCAATATAATATATA ATTACAT
09 P CATTAACAATCATACCACTA CCAAATA
10 P CCTAATCATCAATATACTCA ATACArtC
11 P CATCACCACACATATTATCC TTATCCT
12 P CTAATCTATTAACCACTTAA AATACTT
13 P CAACTTACTCTACACCCCTT TTATCAC
14 P CTCTTCATAATATAATTTAC TACTCTT
15 P CAAACCAACATTTAACACAA TATATCA
16 P CATAAACACCCATTCATCCA ACTCTTA
17 P CTCACCTTCTTAAAACCCAA ATTCTCT
18 P CAAATCTACTTCCAACCACT ATTCAAA
19 P CAACCAACTCTACTTATCAC TCCCAAC
20 P CCCCACCAAATTACAACTAC AATACTC
00 P CCTTCCTAAACCACACTCCA TTTAACC

01 E CCCCCAATAAAATCATACTA CCTTAT
02 E CTTTTACACAATAACTCTTA CTCAAA
03 E CTAACAACAAACCTTACTAC ATCCTA
04 E CAACACCCATCCACTAAACT TAAACA
05 E CACATAACTCAAATCTCAAA TTC«CC
06 E CACCCCATATCAATTTACAA TAATCT
07 E CCATAAACTCCTCATCTTCC CCACTC
08 E CCTCTTCCCATTTTTATTAT TACTTC
09 E CTATTCACAATACACCACCA CAATCC
10 E CACCTAATACCCTATATATA ATACCA
11 E CCACCTTTATACTTCTTACA ACTC^T
12 E CCCTTCTACCAACCTATATC ATTTAT
13 E CAAAAACTAATTCCCAAAAA ATCCTC
14 E CTTTCTTTTCTCAACCCCTT C«CCC«
15 E CTCACTAAATCTTATCTTAC TTTACA
16 E CTTACAATCCCCTATCATAA CATTAT
17 E CCACCATCTTCACCCCCCAA TTTAAT
18 E CAACCACAACCAACCCTACT CCCCTA
19 E CCAACCCTACATTAACTTCT ATCATC
20 E CCATAACTTTAACCTTTnAC ATT««T
00 E CTTCCTCTTTATCTT.^Tx^ *C*TT <.
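The legible sequences in Fig. 3 appear to contain only A, C and T, which would leave the complementary strand free of cytosine, in line with claims 3 and 4. The Python sketch below shows how such a composition check might be written for a set of tags, together with the length and uniqueness conditions from the claims. It tests base composition only and does not model hybridization under stringent conditions. The two example tags are the entries 02 P and 03 P from the figure with the internal space removed; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def lacks_base_in_one_strand(seq, base="C"):
    """True if either the given strand or its complement contains no `base`."""
    return base not in seq or base not in reverse_complement(seq)

def check_tag_set(tags, min_length=15):
    """True if every tag is unique, long enough, and cytosine-free in one strand."""
    return (
        len(set(tags)) == len(tags)
        and all(len(tag) >= min_length for tag in tags)
        and all(lacks_base_in_one_strand(tag) for tag in tags)
    )

# Entries 02 P and 03 P of Fig. 3, with the internal space removed.
tags = [
    "CATAAATTATTCTACTTATAAAAAATA",
    "CAACTCCTATCAACTCTACACTTACTC",
]
tags_ok = check_tag_set(tags)
```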